The recently developed "Data Set Diagonalization" method (DSD) is applied to measure the
compatibility of the data sets that are used to determine parton distribution functions (PDFs).
Discrepancies among the experiments are found to be somewhat larger than is predicted by
propagating the published experimental errors according to Gaussian statistics. The results
support a tolerance criterion of $\Delta\chi^2 \approx 10$ to estimate the 90% confidence range
for PDF uncertainties. No basis is found in the data sets for the much larger $\Delta\chi^2$
values that are in current use, though it will be necessary to retain those larger values until
improved methods can be developed to take account of systematic errors in applying the theory.
The DSD method also measures how much influence each experiment has on the global fit, and
identifies experiments that show significant tension with respect to the others. The method is
used to explore the contribution from muon scattering experiments, which are found to exhibit
the largest discrepancies in the current fit.



1. INTRODUCTION 

Interactions at high energy colliders such as the Tevatron and LHC are interpreted according
to Quantum Chromodynamics (QCD) and Electroweak theory on the basis of collisions between
partons. The analysis of collider data therefore relies on knowing the parton distribution
functions (PDFs) that describe probability densities for the gluon ($g$) and quark partons
($u$, $d$, $c$, $s$, $b$) and their antiquarks ($\bar u$, $\bar d$, $\bar c$, $\bar s$,
$\bar b$) in the proton, as a function of momentum fraction $x$ and QCD factorization scale
$\mu$. In keeping with their importance, there is a sizeable industry in attempting to
determine the PDFs [refs.].

An integral part of the PDF effort is the need to estimate the uncertainty of the results.
An obvious component of that uncertainty comes from the reported errors in the data. It has
become standard practice [refs.] to inflate the uncertainties obtained in this way, motivated
in part by a notion that disagreements between different experiments in the global fit signal
the presence of unknown systematic errors in the experiments, or else indicate important
systematic errors in the theory, which could for example be introduced by the perturbative
approximations to QCD.

The recently invented method of Data Set Diagonalization (DSD) [ref.] offers a direct
assessment of the contribution from each experiment to the global fit, and provides a
statistical measure of the consistency between each experiment and the others. The method
was illustrated in [ref.] by applying it to three of the experiments in a contemporary PDF
analysis [ref.]. That study is extended in this paper to systematically examine the
contribution and consistency of every experiment in the analysis.



The DSD study is important for two reasons. First, 
the overall level of consistency among the experiments 
provides quantitative information on how to assign un- 
certainty estimates to predictions based on the global 
fit. Second, the study identifies experiments whose 
implications are in disagreement with the consensus of 
the others, due to unknown theoretical or experimen- 
tal problems. 



2. THE PDF FITTING PARADIGM 

In current practice, one attempts to determine $u(x)$, $\bar u(x)$, $d(x)$, $\bar d(x)$,
$s(x)$, $\bar s(x)$, $g(x)$ at some low QCD scale $\mu_0$. The distributions at all higher
scales are then given by the QCD renormalization group DGLAP equations. The $c$ and $b$
distributions are generally assumed to arise only from this perturbatively calculable
evolution in $\mu$; and available data are consistent with $s(x) = \bar s(x)$. This leaves 6
unknown functions of $x$ to be determined from experiment. These functions are further
constrained theoretically only by the number sum rules
$\int_0^1 (u(x) - \bar u(x))\,dx = 2$ and $\int_0^1 (d(x) - \bar d(x))\,dx = 1$, the
momentum sum rule, some theoretical predictions on limiting behavior at $x \to 0$ and
$x \to 1$, positivity, and notions of expected smoothness.

In the paradigm used here, the parton distributions at $\mu_0$ are expressed as functional
forms in $x$, with a large number of adjustable parameters. The parameter values are
determined by a "global analysis" in which data from a wide variety of experiments are
fitted simultaneously. No single experiment directly measures any one of the basic
distributions; but the workings of QCD tie each data point to a different convolution
integral over the distributions, and hence to a different combination of the unknown
parameters.



3. THE DSD METHOD 

This Section summarizes the DSD method [ref.]. As a further aid to understanding it, the
method is illustrated in the Appendix by a simple explicit example.

The $\chi^2$ measure of the quality of fit to the full body of data is a function of
parameters $a_1, \ldots, a_n$ that define the parton distributions at scale $\mu_0$ (here
$n = 24$ and $\mu_0 = 1.4$ GeV). The best-fit PDF set is found by minimizing $\chi^2$ with
respect to those parameters. The uncertainty range of the fit is estimated as the region in
parameter space that is sufficiently close to this minimum:
$\chi^2 < \chi^2_{\min} + \Delta\chi^2$.

The dependence of $\chi^2$ on $\{a_i\}$ can be expanded about the minimum through second
order using Taylor series. The eigenvectors of the quadratic form that governs that
expansion can be used as basis vectors to obtain a linear transformation to new coordinates
$\{y_i\}$ for which

$$\chi^2 = \chi^2_{\min} + \sum_{i=1}^{n} y_i^2 . \tag{1}$$

This is known as the Hessian method [ref.]. The DSD method [ref.] builds upon it by
simultaneously diagonalizing $\chi^2$ and the contribution to $\chi^2$ from a subset $S$ of
the data, such as a single one of the experiments. This is done as follows. Let $\chi_S^2$
be the contribution to $\chi^2$ from the subset. In the neighborhood of the global minimum,
$\chi_S^2$ can be expanded through second order in the coordinates $\{y_i\}$ by again using
Taylor series. The eigenvectors of the second-derivative matrix which appears in that
expansion provide a further linear transformation which diagonalizes its quadratic form,
without spoiling Eq. (1). Combining the two linear transformations yields a single linear
transformation of the fitting parameters $\{a_i\}$ to new parameters $\{z_i\}$ for which

$$\chi^2 = \chi_S^2 + \chi_{\bar S}^2 = \sum_{i=1}^{n} z_i^2 + \mathrm{const} \tag{2}$$

$$\chi_S^2 = \sum_{i=1}^{n} \gamma_i\,(z_i - A_i)^2 + \mathrm{const} \tag{3}$$

$$\chi_{\bar S}^2 = \sum_{i=1}^{n} (1 - \gamma_i)\,(z_i - C_i)^2 + \mathrm{const} , \tag{4}$$

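where $\gamma_i A_i + (1 - \gamma_i)\,C_i = 0$. Assuming that $0 < \gamma_i < 1$,
Eqs. (3)-(4) can be written in the form

$$\chi_S^2 = \sum_{i=1}^{n} \left(\frac{z_i - A_i}{B_i}\right)^2 + \mathrm{const} , \qquad
\chi_{\bar S}^2 = \sum_{i=1}^{n} \left(\frac{z_i - C_i}{D_i}\right)^2 + \mathrm{const} , \tag{5}$$

which cries out to be interpreted as independent measurements:

$$S:\ z_i = A_i \pm B_i , \qquad \bar S:\ z_i = C_i \pm D_i . \tag{6}$$

The parameters $\gamma_i$ determine the precision of these measurements through

$$B_i = 1/\sqrt{\gamma_i} , \qquad D_i = 1/\sqrt{1 - \gamma_i} . \tag{7}$$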

In the PDF analysis, the largest values of $\gamma_i$ that appear are $\approx 0.9$. Most of
the $\gamma_i$ are smaller than that, since most properties of the global fit are
significantly constrained by more than one experiment, both because different kinds of
experiments are strongly linked by QCD, and because many of the key measurements have been
made more than once (often by more than one experimental group). The study in Sec. 4 reports
all of the results with $\gamma_i > 0.1$. Directions for which $\gamma_i < 0.1$ can be
neglected, since for these directions the uncertainty of $S$ is at least 3 times larger than
the uncertainty from the other experiments, so it contributes little to the weighted
average. In practice some $\gamma_i$ even come out negative. When that happens, it indicates
that $S$ is so insensitive to $z_i$ in the allowed range $|z_i| < 1$ that the quadratic
approximation has broken down for that experiment along that direction. Since $S$ is
insensitive to $z_i$ along such directions, it is correct to ignore them along with the
other directions for which $\gamma_i < 0.1$.

The new coordinates are chosen such that the average of the two measurements, weighted by
their uncertainties, gives

$$z_i = 0 \pm 1 \tag{8}$$

according to Eq. (2). The difference between the two measurements (6) provides a direct
measure of the consistency between $S$ and its complement $\bar S$. That difference can be
expressed in standard deviations as

$$\sigma_i = \frac{|A_i - C_i|}{\sqrt{B_i^2 + D_i^2}}
           = \sqrt{\gamma_i\,(1 - \gamma_i)}\;|A_i - C_i| . \tag{9}$$

The parameter $\gamma_i$ characterizes the importance of experiment $S$, while the parameter
$\sigma_i$ characterizes its consistency with $\bar S$ along direction $z_i$. In the next
Section, these key parameters are evaluated for every experiment in the PDF global fit.
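The relations in Eqs. (6)-(9) can be checked numerically for a single direction $z_i$. The
following is a minimal sketch, not the analysis code used for the fits: the function names
`dsd_direction` and `weighted_average` and the sample inputs are hypothetical, chosen only
for illustration. It takes $\gamma_i$ and $A_i$ as given, obtains $C_i$ from the constraint
$\gamma_i A_i + (1-\gamma_i) C_i = 0$, and confirms that the weighted average of the two
effective measurements is $0 \pm 1$.

```python
import math

def dsd_direction(gamma, A):
    """Return (B, C, D, sigma) for one DSD direction z_i.

    gamma: weight of subset S along this direction (0 < gamma < 1).
    A:     value of z_i preferred by S alone.
    """
    if not 0.0 < gamma < 1.0:
        raise ValueError("the quadratic picture requires 0 < gamma < 1")
    B = 1.0 / math.sqrt(gamma)              # Eq. (7): uncertainty of S
    D = 1.0 / math.sqrt(1.0 - gamma)        # Eq. (7): uncertainty of the complement
    C = -gamma * A / (1.0 - gamma)          # from gamma*A + (1 - gamma)*C = 0
    sigma = abs(A - C) / math.hypot(B, D)   # Eq. (9): discrepancy in standard deviations
    return B, C, D, sigma

def weighted_average(A, B, C, D):
    """Inverse-variance weighted average of z = A +/- B and z = C +/- D."""
    w_s, w_c = 1.0 / B**2, 1.0 / D**2
    mean = (w_s * A + w_c * C) / (w_s + w_c)
    error = 1.0 / math.sqrt(w_s + w_c)
    return mean, error

if __name__ == "__main__":
    # Illustrative inputs only; they are not taken from the fit.
    gamma, A = 0.3, 1.5
    B, C, D, sigma = dsd_direction(gamma, A)
    print("B, C, D, sigma =", B, C, D, sigma)
    print("weighted average =", weighted_average(A, B, C, D))
    # At the importance threshold gamma = 0.1, S is 3 times less precise than its complement.
    B10, _, D10, _ = dsd_direction(0.1, 1.0)
    print("B/D at gamma = 0.1:", B10 / D10)
```

The ratio $B_i/D_i = \sqrt{(1-\gamma_i)/\gamma_i}$ equals 3 at $\gamma_i = 0.1$, which is the
factor quoted above for neglecting directions below that threshold.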



4. RESULTS FROM THE DSD METHOD 

We study a body of input data that is nearly the same as was used in the recent CT09
analysis [ref.]. The parametrization of the PDFs is identical to CT09, with the same 24 free
parameters. The definition of $\chi^2$ used here is just the sum over data points of
((data $-$ theory)/error)$^2$, except for including correlated systematic experimental
errors for all data sets for which these have been published. Unlike in CT09, no weight
factors or penalties are applied in $\chi^2$ to emphasize particular experiments.

The centerpiece of this study is presented in Table I, which lists all of the measurements
$(\gamma_i, \sigma_i)$ that pass the importance criterion $\gamma_i > 0.1$. The parameter
$\gamma_i$ measures the importance of the experiment under study in determining the result
$z_i = 0$ of the global fit, while $\sigma_i$ measures the discrepancy between that
experiment and the consensus of the others. One must keep in mind that to generate this
table, the DSD method had to be applied separately for each experiment. Hence the definition
of the coordinates $z_i$ is different for each line in the table. The measurements from each
data set are listed in descending order of $\gamma_i$, so in each case, $i = 1$ labels the
parameter that is measured best by the experiment under study.

The data sets in Table I are grouped according to their initial-state particles. These
groupings involve different experimental techniques, and even different laboratories. The
$ep$ results from HERA (H1 + ZEUS) cover similar kinematic regions using similar techniques,
but they are listed separately to satisfy possible curiosity. From a theoretical point of
view, $ep \to eX$ and $\mu p \to \mu X$ deep inelastic scattering (DIS) measurements are
equivalent. However, the $\mu p$ data are from fixed-target experiments that cover a
different kinematic region from the $ep$ experiments, as will be discussed in Sec. 9.

In addition to the 29 data sets listed in Table I, the fit includes the 8 data sets listed
in Table II, which contribute no information of importance $\gamma_i > 0.1$. These are all
HERA experiments with relatively low statistics.

Table I shows that the HERA data contribute a substantial portion of our knowledge on PDFs.
However, it also shows major contributions from fixed-target $\mu p$ and $\mu d$ DIS
experiments, which is not surprising because of their high statistics, and because the
deuterium target measurements help to differentiate among quark flavors. There are also
major contributions from Drell-Yan (DY) lepton pair production on fixed targets; from
Tevatron $p\bar p$ inclusive jet experiments and the forward-backward lepton asymmetry from
$W$ decay; and from neutrino experiments.

It is shown in [ref.] that $\gamma_i$ can be interpreted as the fraction of the global
measurement $z_i = 0 \pm 1$ that is contributed by the data in $S$. The column listing
$\sum_i \gamma_i$ in Table I can therefore be thought of as the number of fitting parameters
that are determined by the experiment in question. Totaling these for each experimental
category, we find that H1 and ZEUS experiments combined effectively measure 6.2 parameters;
muon ($\mu p$, $\mu d$) experiments measure 5.8; DY experiments measure 5.0; neutrino
experiments measure 3.5; and Tevatron experiments measure 2.7. The sum of these numbers is
23.2, which is satisfyingly close to the actual number $n = 24$ of parameters that were
fitted. The fact that all of these types of experiment are needed to get the best
information on PDFs has long been believed; but it is established here quantitatively for
the first time.
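The category totals quoted above can be reproduced directly from the $\sum_i \gamma_i$
column of Table I. The short sketch below does that bookkeeping; the dictionary keys are
labels introduced here for illustration, and the grouping of data sets into categories is
read off the table rows. The quoted 23.2 is the sum of the five rounded category totals (the
unrounded column adds up to about 23.1).

```python
# Sum-of-gamma column of Table I, grouped by experiment category (labels are illustrative).
gamma_sums = {
    "H1 + ZEUS": [2.10, 0.30, 0.37, 0.24, 0.13, 1.69, 0.36, 0.55, 0.32, 0.12],
    "muon DIS": [2.21, 0.90, 0.49, 2.17],
    "Drell-Yan": [1.52, 1.92, 1.52],
    "neutrino": [0.84, 0.61, 0.13, 0.11, 0.68, 0.56, 0.41, 0.14],
    "Tevatron": [0.91, 0.16, 0.92, 0.68],
}

# Effective number of fitting parameters measured by each category, to one decimal place.
category_totals = {name: round(sum(values), 1) for name, values in gamma_sums.items()}

# Sum of the rounded category totals, as quoted in the text.
grand_total = round(sum(category_totals.values()), 1)

# Number of data sets entering Table I.
n_data_sets = sum(len(values) for values in gamma_sums.values())

if __name__ == "__main__":
    for name, total in category_totals.items():
        print(f"{name:10s} {total:4.1f}")
    print("sum of category totals:", grand_total, "from", n_data_sets, "data sets")
```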
FIG. 1: Results from Table I: $ep$ (daisy); $\mu p$, $\mu d$ (○); $pp$, $pd$, $p$Cu (□);
$p\bar p$ (▽); $\nu A$ (△).



The results in Table I are displayed graphically in Fig. 1. This plot shows that the
effective measurements are widely distributed in the $(\gamma, \sigma)$ plane. Broadly
speaking, all of the experiment types contribute to all parts of the plot, with one possible
exception that is explored in Sec. 6. Smaller values of $\gamma$ are more common because
most aspects of the fit are constrained by more than one experiment. Smaller values of
$\sigma$ are more common because the fit is reasonably self-consistent. The distribution of
$\sigma$ is examined in detail in the next Section.
| Process | Expt | N | $\sum_i \gamma_i$ | $(\gamma_1, \sigma_1), (\gamma_2, \sigma_2), \ldots$ |
|---|---|---|---|---|
| $e^+ p \to e^+ X$ | H1 NC [10] | 115 | 2.10 | (0.72, 0.01) (0.59, 3.02*) (0.43, 0.20) (0.36, 1.37) |
| $e^- p \to e^- X$ | H1 NC [11] | 126 | 0.30 | (0.30, 0.02) |
| $e^+ p \to e^+ X$ | H1 NC [12] | 147 | 0.37 | (0.21, 0.06) (0.16, 0.83) |
| $e^+ p \to \bar\nu X$ | H1 CC [13] | 25 | 0.24 | (0.24, 0.00) |
| $e^- p \to \nu X$ | H1 CC [11] | 28 | 0.13 | (0.13, 0.00) |
| $e^+ p \to e^+ X$ | ZEUS NC [14] | 227 | 1.69 | (0.45, 3.13*) (0.42, 0.32) (0.35, 3.20*) (0.29, 0.80) (0.18, 0.64) |
| $e^+ p \to e^+ X$ | ZEUS NC [15] | 90 | 0.36 | (0.22, 0.01) (0.14, 1.61) |
| $e^+ p \to \bar\nu X$ | ZEUS CC [16] | 29 | 0.55 | (0.55, 0.04) |
| $e^+ p \to \bar\nu X$ | ZEUS CC [17] | 30 | 0.32 | (0.32, 0.10) |
| $e^- p \to \nu X$ | ZEUS CC [18] | 26 | 0.12 | (0.12, 0.02) |
| $\mu p \to \mu X$ | BCDMS $F_2 p$ [19] | 339 | 2.21 | (0.68, 0.50) (0.63, 1.63) (0.43, 0.80) (0.34, 4.93*) (0.13, 0.94) |
| $\mu d \to \mu X$ | BCDMS $F_2 d$ [20] | 251 | 0.90 | (0.32, 0.67) (0.24, 2.49) (0.19, 2.09) (0.16, 5.22*) |
| $\mu p \to \mu X$ | NMC $F_2 p$ [21] | 201 | 0.49 | (0.20, 4.56*) (0.17, 4.76*) (0.12, 0.50) |
| $\mu p/d \to \mu X$ | NMC $F_2\, p/d$ [21] | 123 | 2.17 | (0.61, 1.11) (0.56, 3.60*) (0.43, 0.90) (0.36, 0.79) (0.21, 1.41) |
| $p\,\mathrm{Cu} \to \mu^+\mu^- X$ | E605 [22] | 119 | 1.52 | (0.91, 1.29) (0.38, 1.12) (0.23, 0.31) |
| $pp, pd \to \mu^+\mu^- X$ | E866 pp/pd [23] | 15 | 1.92 | (0.88, 0.57) (0.69, 1.15) (0.35, 1.80) |
| $pp \to \mu^+\mu^- X$ | E866 pp [24] | 184 | 1.52 | (0.75, 0.04) (0.39, 1.79) (0.23, 1.94) (0.14, 3.57*) |
| $p\bar p \to (W \to \ell\nu) X$ | CDF Wasy [25] | 11 | 0.91 | (0.57, 0.33) (0.34, 0.51) |
| $p\bar p \to (W \to \ell\nu) X$ | CDF Wasy [26] | 11 | 0.16 | (0.16, 2.84) |
| $p\bar p \to \mathrm{jet}\, X$ | CDF Jet [8] | 72 | 0.92 | (0.48, 0.47) (0.44, 3.86*) |
| $p\bar p \to \mathrm{jet}\, X$ | D0 Jet [9] | 110 | 0.68 | (0.39, 1.70) (0.29, 0.76) |
| $\nu\,\mathrm{Fe} \to \mu X$ | NuTeV $F_2$ [27] | 69 | 0.84 | (0.37, 2.75) (0.29, 0.42) (0.18, 0.97) |
| $\nu\,\mathrm{Fe} \to \mu X$ | NuTeV $F_3$ [28] | 86 | 0.61 | (0.30, 0.50) (0.16, 1.35) (0.15, 0.30) |
| $\nu\,\mathrm{Fe} \to \mu X$ | CDHSW [29] | 96 | 0.13 | (0.13, 0.04) |
| $\nu\,\mathrm{Fe} \to \mu X$ | CDHSW [29] | 85 | 0.11 | (0.11, 1.32) |
| $\nu\,\mathrm{Fe} \to \mu^+\mu^- X$ | NuTeV [30] | 38 | 0.68 | (0.39, 0.31) (0.29, 0.66) |
| $\nu\,\mathrm{Fe} \to \mu^+\mu^- X$ | NuTeV [30] | 33 | 0.56 | (0.32, 0.18) (0.24, 2.56) |
| $\nu\,\mathrm{Fe} \to \mu^+\mu^- X$ | CCFR [30] | 40 | 0.41 | (0.24, 1.37) (0.17, 0.12) |
| $\nu\,\mathrm{Fe} \to \mu^+\mu^- X$ | CCFR [30] | 38 | 0.14 | (0.14, 0.79) |

TABLE I: Experiments in the PDF fit that provide at least one measurement with
$\gamma_i > 0.1$. Large discrepancies ($\sigma_i > 3$) are marked with an asterisk.



5. DISTRIBUTION OF THE DISCREPANCIES

According to Gaussian statistics, the 68 discrepancies $\{\sigma_i\}$ listed in Table I
would be expected to follow the normal distribution

$$\frac{dP}{d\sigma} = \frac{1}{\sqrt{2\pi}}\,\exp(-\sigma^2/2) . \tag{10}$$

A histogram of the actual distribution is shown in Fig. 2 together with that prediction. The
distribution is clearly broader than the prediction. Hence, the observed inconsistencies
among the data sets are larger than what is predicted by Gaussian statistics. This can also
be seen from the number of "outliers": 10 measurements out of 68 in Table I have
$\sigma_i > 3$. The probability for so many large values to arise by random fluctuations
from the distribution (10) is vanishingly small: even 5 instances of $|\sigma_i| > 3$ in 68
tries is a million-to-one long shot.
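The size of that long shot can be estimated with a binomial tail sum. The sketch below
treats the 68 values as independent draws from the unit Gaussian of Eq. (10), which is a
simplifying assumption made here for illustration; the function names are hypothetical.

```python
import math

def two_sided_tail(threshold):
    """P(|sigma| > threshold) for a unit-width Gaussian."""
    return math.erfc(threshold / math.sqrt(2.0))

def prob_at_least(k_min, n, p):
    """Binomial probability of at least k_min successes in n independent tries."""
    return sum(math.comb(n, k) * p**k * (1.0 - p) ** (n - k)
               for k in range(k_min, n + 1))

p_outlier = two_sided_tail(3.0)            # chance that one measurement has |sigma| > 3
p_five = prob_at_least(5, 68, p_outlier)   # at least 5 outliers among 68
p_ten = prob_at_least(10, 68, p_outlier)   # at least 10 outliers, as observed in Table I

if __name__ == "__main__":
    print(f"P(|sigma| > 3)       = {p_outlier:.5f}")
    print(f"P(>= 5 of 68 > 3)    = {p_five:.3e}")
    print(f"P(>= 10 of 68 > 3)   = {p_ten:.3e}")
```

Under these assumptions the probability of five or more outliers comes out at roughly one
in a million, and the probability of the ten actually observed is smaller by many further
orders of magnitude.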

When it is necessary to combine experimental results that lie outside a comfortable range
of statistical agreement, a standard course of action is to scale up the errors; see, e.g.,
the Particle Data Group tables in [37]. That approach suggests fitting the histogram in
Fig. 2 to a Gaussian form with adjustable width:

$$\frac{dP}{d\sigma} = \frac{1}{\sqrt{2\pi}\,c}\,\exp\!\left(-\frac{\sigma^2}{2c^2}\right) . \tag{11}$$

A maximum-likelihood fit to this form yields $c = 1.88$. This suggests that the errors in
the PDF fit need to be scaled up by nearly a factor of 2 to allow for the observed
inconsistencies among the data sets. This fit is also shown in Fig. 2.

| Process | Expt | N |
|---|---|---|
| $e^- p \to e^- X$ | H1 NC [12] | 13 |
| $e^+ p \to \bar\nu X$ | H1 CC [12] | 28 |
| $e^+ p \to c X$ | H1 $F_2^c$ [31] | 8 |
| $e^+ p \to c X$ | H1 $F_2^c$ [32, 33] | 10 |
| | H1 $F_2^b$ [32, 33] | 10 |
| $e^- p \to e^- X$ | ZEUS NC [34] | 92 |
| $e^+ p \to c\bar c X$ | ZEUS $F_2^c$ [35] | 18 |
| $e p \to c\bar c X$ | ZEUS $F_2^c$ [36] | 27 |

TABLE II: Experiments in the PDF fit with no measurements with $\gamma_i > 0.1$.

FIG. 2: Distribution of the discrepancies $\sigma_i$ from Table I. The solid curve is the
parameter-free Gaussian prediction (10). The long-dashed curve is a fit to the scaled
Gaussian form (11). The short-dashed curve is a fit to the squared-Lorentzian form (12).

Although the scaled Gaussian is an improvement over the absolute one, the fit it provides is
not entirely satisfactory. A much better description of the histogram can be obtained using
a form with a more slowly falling tail, such as the squared-Lorentzian:

$$\frac{dP}{d\sigma} = \frac{2m^3/\pi}{(\sigma^2 + m^2)^2} . \tag{12}$$

This curve is also shown in Fig. 2, using the parameter value $m = 2.17$ obtained by
maximum-likelihood fitting.
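Both width parameters can be recovered from the 68 values of $\sigma_i$ in Table I. The
sketch below is an illustration rather than the original fitting code: it assumes the
unbinned likelihoods implied by Eqs. (11) and (12), for which the scaled-Gaussian maximum is
at $c^2 = \langle\sigma^2\rangle$ and the squared-Lorentzian maximum satisfies
$\sum_i m^2/(\sigma_i^2 + m^2) = 3N/4$. The function names and the bisection bracket are
choices made here.

```python
import math

# The 68 discrepancies sigma_i transcribed from Table I, in table order.
SIGMAS = [
    0.01, 3.02, 0.20, 1.37, 0.02, 0.06, 0.83, 0.00, 0.00,              # H1
    3.13, 0.32, 3.20, 0.80, 0.64, 0.01, 1.61, 0.04, 0.10, 0.02,        # ZEUS
    0.50, 1.63, 0.80, 4.93, 0.94, 0.67, 2.49, 2.09, 5.22,              # BCDMS
    4.56, 4.76, 0.50, 1.11, 3.60, 0.90, 0.79, 1.41,                    # NMC
    1.29, 1.12, 0.31, 0.57, 1.15, 1.80, 0.04, 1.79, 1.94, 3.57,        # Drell-Yan
    0.33, 0.51, 2.84, 0.47, 3.86, 1.70, 0.76,                          # Tevatron
    2.75, 0.42, 0.97, 0.50, 1.35, 0.30, 0.04, 1.32,                    # neutrino DIS
    0.31, 0.66, 0.18, 2.56, 1.37, 0.12, 0.79,                          # neutrino dimuon
]

def fit_scaled_gaussian(sigmas):
    """Maximum-likelihood width c for Eq. (11): c^2 is the mean of sigma^2."""
    return math.sqrt(sum(s * s for s in sigmas) / len(sigmas))

def fit_squared_lorentzian(sigmas, lo=0.1, hi=20.0, tol=1e-10):
    """Maximum-likelihood width m for Eq. (12), found by bisection.

    Setting d(log L)/dm = 0 gives sum_i m^2 / (sigma_i^2 + m^2) = 3N/4,
    and the left-hand side increases monotonically with m.
    """
    target = 0.75 * len(sigmas)

    def excess(m):
        return sum(m * m / (s * s + m * m) for s in sigmas) - target

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

c_fit = fit_scaled_gaussian(SIGMAS)
m_fit = fit_squared_lorentzian(SIGMAS)
n_outliers = sum(1 for s in SIGMAS if s > 3.0)

if __name__ == "__main__":
    print(f"N = {len(SIGMAS)}, outliers with sigma > 3: {n_outliers}")
    print(f"scaled Gaussian c = {c_fit:.2f}, squared Lorentzian m = {m_fit:.2f}")
```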



FIG. 3: Distribution of the discrepancies $\sigma_i$ from Table I for measurements with:
(a) $i = 1$ only; (b) $i = 3, 4, 5$; (c) $0.10 < \gamma_i < 0.25$; (d) $\gamma_i > 0.40$.
Curves shown are the absolute Gaussian and squared-Lorentzian curves from Fig. 2, normalized
to the number of points in each histogram, but not refitted.



The distribution of discrepancies seen in Fig. 2 appears to be a general characteristic of
the global fit. This is demonstrated by Fig. 3, which shows histograms for various subsets
of the $(\gamma_i, \sigma_i)$ pairs from Table I: (a) those with $i = 1$, i.e., the
best-measured parameter from each experiment; (b) those with $i = 3, 4, 5$, i.e., less
well-measured parameters from each experiment; (c) those with $0.10 < \gamma_i < 0.25$,
i.e., parameters that are weakly determined by the experiment under study; and (d) those
with $\gamma_i > 0.40$, i.e., parameters that are strongly determined by the experiment
under study. (The middle ranges, $i = 2$ in (a) and (b) and $0.25 < \gamma_i < 0.40$ in (c)
and (d), are excluded from these histograms in an attempt to accentuate any systematic
differences.) As far as can be seen with the limited statistics, these distributions all
look alike. They are all inconsistent with the absolute Gaussian prediction, and they are
all consistent with the squared-Lorentzian form, whose width parameter $m = 2.17$ is kept
the same as in Fig. 2.

The only systematic trend that is suggested by Fig. 1 is a tendency for the muon
experiments to have larger-than-average discrepancies. That trend is explored in the next
Section.
6. ROLE OF THE MUON EXPERIMENTS

Figure 1 (or Table I) shows that the four largest discrepancies $\sigma_i$ all come from the
$\mu p$ and $\mu d$ fixed-target BCDMS and NMC experiments. This is perhaps not surprising,
since a significant tension between those experiments and the rest of the global fit was
already observed in CTEQ5 [38], using the less sophisticated method of plotting $\chi_S^2$
vs. $\chi_{\bar S}^2$ [39]. Tension between the NMC and BCDMS data sets can also be inferred
from a recent MSTW paper [40], which shows that the two experiments prefer values of
$\alpha_s(m_Z)$ that differ significantly, in opposite directions, from the approximate
world-average value 0.118 that is used here [37].

Figure 4 shows the histogram of $\sigma_i$ for the muon experiments and the others
separately. The muon histogram looks quite different: it contains zero counts in the first
bin and all four of the counts with $\sigma > 4$. It is therefore natural to raise the
question of whether some or all of the muon experiments (or their theoretical treatment) may
contain important systematic errors that have been neglected. This question is explored in
detail in Sec. 9.

The essential question from the standpoint of this paper is whether the deviation from
Gaussian behavior seen in Fig. 2 is a general characteristic of the PDF fits, or whether it
could instead just point to problems with the muon data sets. The right hand side of Fig. 4
appears to show that the non-muon data also have a tail at large $\sigma$ which is
inconsistent with the ideal Gaussian curve. However, if one speculates that the muon data or
their theoretical treatment may be incorrect, then those large-$\sigma$ points might merely
reflect a conflict with the muon data. A direct way to proceed is to repeat the analysis
that led to Table I with all four of the muon experiments omitted from the global fit. In
carrying this out, it was necessary to reduce the number of fitting parameters from 24 to 21
in order to obtain stable fits to the reduced input data. Results from this study are shown
in Fig. 5. The distribution is again broader than the absolute Gaussian prediction, so the
central conclusion from Sec. 5 stands even if the muon data are excluded. Because no extreme
outlying points appear in this histogram, a rescaled Gaussian ($c = 1.70$ in Eq. (11)) this
time gives an acceptable fit. It is even slightly better than a squared-Lorentzian fit
($m = 2.51$ in Eq. (12)).

The DSD method can be used to further explore the contribution of specific experiments to
the global fit. This is pursued for the muon experiments in Sec. 9.



7. NON-GAUSSIAN STATISTICS AND $\Delta\chi^2$

This Section explains the concept of $\Delta\chi^2$ and estimates it for the PDF fit. Let us
consider a simple scenario that is similar to the DSD situation of measuring a single
variable $z_i$ in the symmetric case $\gamma_i = 0.5$. Specifically, suppose a quantity $z$
is measured by two equally-trustworthy experiments, which report



$$\text{Expt 1:}\ z = A \pm \sqrt{2} , \qquad \text{Expt 2:}\ z = B \pm \sqrt{2} . \tag{13}$$

We wish to combine these two measurements into a single result. According to standard
Gaussian statistics, that is done by taking the average and combining the errors in
quadrature:

$$z = (A + B)/2 \pm 1 . \tag{14}$$

The $\chi^2$ measure of fit quality is

$$\chi^2 = \left(\frac{z - A}{\sqrt{2}}\right)^2 + \left(\frac{z - B}{\sqrt{2}}\right)^2 \tag{15}$$

$$\quad\ = \left(z - \frac{A + B}{2}\right)^2 + \left(\frac{A - B}{2}\right)^2 . \tag{16}$$

The algebraic rearrangement in Eq. (16) reveals that the expected best-fit value
$z = (A + B)/2$ indeed minimizes $\chi^2$, and that the error limits in (14) correspond to
the points where $\chi^2 = \chi^2_{\min} + \Delta\chi^2$ with $\Delta\chi^2 = 1$.

This corresponds to the 68.3% confidence limit, i.e. "$1\sigma$." The uncertainty limit for
90% confidence is farther from the minimum in $z$ by a factor 1.64, which corresponds to
$\Delta\chi^2 = 2.71$.

Now let us see what happens if we do not assume that the errors are Gaussian. Suppose
instead that the measurements $A$ and $B$ arise from independent random processes with
probability distributions

$$\frac{dP}{dA} = f(A) \tag{17}$$

$$\frac{dP}{dB} = f(B) \tag{18}$$

where $\int_{-\infty}^{\infty} f(A)\, dA = 1$, and we assume for simplicity that $A$ and $B$
come from the same distribution. We can assume without loss of generality that this
distribution is centered about a true answer of 0. Let us also assume that the distribution
is symmetric: $f(A) = f(-A)$. It is intuitively clear that the best estimate from the two
measurements will remain equal to the average $(A + B)/2$, so the real issue is how to
assess the uncertainty on that result.
FIG. 4: Distribution of the discrepancies $\sigma_i$ from Table I. Left panel: $\mu p$ and
$\mu d$ experiments only. Right panel: all experiments except $\mu p$ and $\mu d$. Curves
are the absolute Gaussian prediction and the squared-Lorentzian fit, both the same as shown
in Fig. 3 except for normalization.

FIG. 5: Distribution of the discrepancies $\sigma_i$ from global fits with the $\mu p$ data
omitted. Solid curve is the parameter-free Gaussian prediction (10). Long dash and short
dash curves are new fits to scaled-Gaussian (11) and squared-Lorentzian (12) forms
respectively.



If the two measurements can be repeated many times, the probability distribution for their
average $(A + B)/2$ is given by

$$\frac{dP}{dz} = p_z(z)
  = \int_{-\infty}^{\infty} dA\, f(A) \int_{-\infty}^{\infty} dB\, f(B)\,
    \delta\!\left(\frac{A + B}{2} - z\right)
  = 2 \int_{-\infty}^{\infty} dA\, f(A)\, f(2z - A) . \tag{19}$$

Meanwhile, the probability distribution for the difference $(A - B)$ between the two
measurements, expressed in units of its error $\sqrt{(\sqrt{2})^2 + (\sqrt{2})^2} = 2$, is
given by

$$\frac{dP}{d\sigma} = p_\sigma(\sigma)
  = \int_{-\infty}^{\infty} dA\, f(A) \int_{-\infty}^{\infty} dB\, f(B)\,
    \delta\!\left(\frac{A - B}{2} - \sigma\right)
  = 2 \int_{-\infty}^{\infty} dA\, f(A)\, f(A - 2\sigma) . \tag{20}$$

Comparing Eqs. (19) and (20) and using the assumed symmetry $f(A) = f(-A)$, we obtain

$$p_z(z) = p_\sigma(z) . \tag{21}$$

Eq. (21) shows that the uncertainty distribution for the average of the two measurements is
the same as the uncertainty distribution for their difference, when that difference is
normalized by its error as is done here. The former is what is needed to estimate the
uncertainties of the PDF results, while the latter is what is measured in the histogram of
Fig. 2.

Before proceeding, let us check that the above formulae reproduce the correct results in the
Gaussian case $f(A) = \exp(-A^2/4)/\sqrt{4\pi}$. In that case, Eqs. (19) and (21) give
$p_z(z) = \exp(-z^2/2)/\sqrt{2\pi}$ and $p_\sigma(\sigma) = \exp(-\sigma^2/2)/\sqrt{2\pi}$.
Thus $dP/dz$ and $dP/d\sigma$ are both Gaussians of width 1, which indeed agrees with the
standard rules for propagating the uncertainties from Eq. (13). The middle 68.3% (90%) of
the probability distribution $dP/dz$ is contained in $|z| < 1.00$ (1.64), which corresponds
to the points where $\Delta\chi^2 = 1.00$ (2.71), in agreement with earlier statements.

If the distribution of differences, and hence according to (21) the distribution of
averages, is given by the scaled Gaussian form (11), then the uncertainty limits in $z$ are
scaled by the parameter $c$ in that formula. Hence the 90% confidence tolerance becomes
$\Delta\chi^2 = 2.71\, c^2$. For the value $c = 1.88$ found in Sec. 5 from the fit in
Fig. 2, this implies $\Delta\chi^2 \approx 10$ ($2.71 \times 1.88^2 = 9.6$).

If, on the other hand, the distribution of differences, and hence the distribution of
averages, is given by the squared-Lorentzian form (12) with the width parameter $m = 2.17$
that was found in Sec. 5 by fitting the distribution of differences, then the central 68.3%
(90%) of the distribution is contained in $|z| < 1.50$ (2.95), which corresponds to
$\Delta\chi^2 = 2.25$ ($\Delta\chi^2 = 8.70$). Note that the ratio of $\Delta\chi^2$ between
the 90% and 68.3% confidence points is larger for the squared-Lorentzian distribution
($8.70/2.25 = 3.9$) than for the Gaussian distribution ($2.71/1.00 = 2.7$), because of the
relatively slowly-falling tail of the Lorentzian.

These results suggest that the 90% confidence criterion for the uncertainty of the global
fit is given by $\Delta\chi^2 \approx 10$.
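The confidence limits quoted in this Section can be checked with a few lines of code. The
sketch below is illustrative only: it uses the closed-form coverage of the
squared-Lorentzian (12),
$P(|\sigma| < z) = (2/\pi)\,[\,mz/(z^2 + m^2) + \arctan(z/m)\,]$, obtained by integrating
Eq. (12), together with the error function for the scaled Gaussian (11), and inverts each by
bisection. The function names and the bracket are choices made here.

```python
import math

def coverage_gaussian(z, c=1.0):
    """P(|sigma| < z) for the scaled Gaussian of Eq. (11)."""
    return math.erf(z / (c * math.sqrt(2.0)))

def coverage_squared_lorentzian(z, m):
    """P(|sigma| < z) for the squared-Lorentzian of Eq. (12)."""
    return (2.0 / math.pi) * (m * z / (z * z + m * m) + math.atan(z / m))

def half_width(coverage, level, lo=0.0, hi=100.0, tol=1e-10):
    """Solve coverage(z) = level for z by bisection; coverage must increase with z."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if coverage(mid) < level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

C_FIT, M_FIT = 1.88, 2.17   # width parameters quoted in Sec. 5

z90_gauss = half_width(coverage_gaussian, 0.90)
z90_scaled = half_width(lambda z: coverage_gaussian(z, C_FIT), 0.90)
z68_lorentz = half_width(lambda z: coverage_squared_lorentzian(z, M_FIT), 0.6827)
z90_lorentz = half_width(lambda z: coverage_squared_lorentzian(z, M_FIT), 0.90)

if __name__ == "__main__":
    # Delta chi^2 is the square of the half-width in z, following Eq. (16).
    print(f"unit Gaussian, 90%:        dchi2 = {z90_gauss**2:.2f}")
    print(f"scaled Gaussian, 90%:      dchi2 = {z90_scaled**2:.2f}")
    print(f"squared Lorentzian, 68.3%: dchi2 = {z68_lorentz**2:.2f}")
    print(f"squared Lorentzian, 90%:   dchi2 = {z90_lorentz**2:.2f}")
```

Both non-Gaussian descriptions land between about 8.7 and 9.6 at 90% confidence, which is
the basis for the rounded value of 10.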



8. REMARK ON $\chi^2/N$

The overall $\chi^2_{\rm total}$ (= 3074) for the global fit is not far from the total
number of data points ($N_{\rm total} = 2970$) in the fit. This at first seems to contradict
the idea that there are inconsistencies in the fit that are nearly twice the expectation
based on the experimental errors. For example, in the extreme, if the actual errors for all
of the data points were a factor of 2 larger than the errors claimed by the experiments, we
would expect $\chi^2_{\rm total}/N_{\rm total} \approx 4$.

However, the actual situation does not correspond to that extreme. A given experiment with
$N$ data points delivers significant information along only a few directions in parameter
space, at most 5 or 6 according to Table I. Of those directions, there is significant
discord along at most 2 or 3. We can estimate the effect of this on $\chi^2/N$ as follows.
Eq. (5) shows that the lowest possible $\chi^2$ for experiment $S$ occurs at
$\{z_i = A_i\}$, while the global best-fit value occurs at $\{z_i = 0\}$. Hence in the
global fit, $\chi_S^2$ lies above its best-fit value by

$$\Delta\chi_S^2 = \sum_{i=1}^{n} \left(\frac{A_i}{B_i}\right)^2 . \tag{22}$$

Combining Eqs. (5)-(7) and using $\gamma_i A_i + (1 - \gamma_i)\,C_i = 0$, which follows
from Eq. (1), we obtain

$$\left|\frac{A_i}{B_i}\right| = \sqrt{1 - \gamma_i}\;\sigma_i . \tag{23}$$

Hence the addition to $\chi^2$ from experiment $S$ is

$$\Delta\chi_S^2 = \sum_{i=1}^{n} \left(\frac{A_i}{B_i}\right)^2
                 = \sum_{i=1}^{n} (1 - \gamma_i)\,\sigma_i^2 . \tag{24}$$

Adding this up over the 68 $(\gamma_i, \sigma_i)$ pairs with $\gamma_i > 0.1$ in Table I
gives a total of 168. The full data set has 2970 points, so this contribution of
$\approx 168$ to $\chi^2_{\rm total}$ is not large enough to spoil the expectation that
$\chi^2_{\rm total} \approx N_{\rm total} \pm \sqrt{2 N_{\rm total}}$.
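The total of 168 can be reproduced from Table I by evaluating Eq. (24) over all 68 pairs.
The sketch below does this; the pair list is transcribed from the table, and the function
name `chi2_excess` is introduced here for illustration.

```python
import math

# (gamma_i, sigma_i) pairs transcribed from Table I, in table order.
PAIRS = [
    (0.72, 0.01), (0.59, 3.02), (0.43, 0.20), (0.36, 1.37), (0.30, 0.02),
    (0.21, 0.06), (0.16, 0.83), (0.24, 0.00), (0.13, 0.00),                  # H1
    (0.45, 3.13), (0.42, 0.32), (0.35, 3.20), (0.29, 0.80), (0.18, 0.64),
    (0.22, 0.01), (0.14, 1.61), (0.55, 0.04), (0.32, 0.10), (0.12, 0.02),    # ZEUS
    (0.68, 0.50), (0.63, 1.63), (0.43, 0.80), (0.34, 4.93), (0.13, 0.94),
    (0.32, 0.67), (0.24, 2.49), (0.19, 2.09), (0.16, 5.22),                  # BCDMS
    (0.20, 4.56), (0.17, 4.76), (0.12, 0.50),
    (0.61, 1.11), (0.56, 3.60), (0.43, 0.90), (0.36, 0.79), (0.21, 1.41),    # NMC
    (0.91, 1.29), (0.38, 1.12), (0.23, 0.31), (0.88, 0.57), (0.69, 1.15),
    (0.35, 1.80), (0.75, 0.04), (0.39, 1.79), (0.23, 1.94), (0.14, 3.57),    # Drell-Yan
    (0.57, 0.33), (0.34, 0.51), (0.16, 2.84), (0.48, 0.47), (0.44, 3.86),
    (0.39, 1.70), (0.29, 0.76),                                              # Tevatron
    (0.37, 2.75), (0.29, 0.42), (0.18, 0.97), (0.30, 0.50), (0.16, 1.35),
    (0.15, 0.30), (0.13, 0.04), (0.11, 1.32),                                # neutrino DIS
    (0.39, 0.31), (0.29, 0.66), (0.32, 0.18), (0.24, 2.56), (0.24, 1.37),
    (0.17, 0.12), (0.14, 0.79),                                              # neutrino dimuon
]

def chi2_excess(pairs):
    """Eq. (24): sum over directions of (1 - gamma_i) * sigma_i**2."""
    return sum((1.0 - gamma) * sigma * sigma for gamma, sigma in pairs)

excess = chi2_excess(PAIRS)
n_total = 2970                              # data points in the global fit
chi2_total = 3074                           # overall chi^2 of the global fit
expected_spread = math.sqrt(2.0 * n_total)  # Gaussian expectation for the spread of chi^2

if __name__ == "__main__":
    print(f"pairs: {len(PAIRS)}, excess from Eq. (24): {excess:.1f}")
    print(f"chi2_total - N_total = {chi2_total - n_total}, sqrt(2N) = {expected_spread:.1f}")
```

The excess of about 168 is roughly twice $\sqrt{2 N_{\rm total}} \approx 77$, and it is
spread over all of the experiments, so it does not push $\chi^2_{\rm total}/N_{\rm total}$
far from 1.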




FIG. 6: Kinematic region covered by the lepton DIS experiments $ep \to eX$ (H1 = △,
ZEUS = ▽) and $\mu p \to \mu X$ (BCDMS = □, NMC = ○).



9. FURTHER STUDY OF THE MUON EXPERIMENTS

The largest tensions in the current PDF fit involve the four muon-initiated fixed-target
experiments ($\mu p \to \mu X$ and $\mu d \to \mu X$ measured by both BCDMS and NMC), as
noted in Sec. 6. The kinematic regions covered by $\mu p \to \mu X$ and $ep \to eX$
experiments are shown in Fig. 6. There is considerable overlap between the BCDMS and NMC
experimental regions; but BCDMS extends farther toward $x = 1$, while NMC extends farther
toward small $x$ and small $Q$. Hence it is possible that the four muon experiments each
measure different quantities. Meanwhile, the H1 and ZEUS regions overlap completely with
each other, and hardly at all with the muon experiments.

The observed tension involving the muon exper- 
iments could arise from inconsistencies within each 
muon data set, or disagreements between them, or 
disagreements between them and the non-muon ex- 
periments. The DSD method is an excellent tool to 
sort this out. 

The first four lines of Table III show results from applying the DSD method to four new
global fits, in which the experiment listed is the only one of the muon experiments included
in the fit. We see that large discrepancies, signaled by large $\sigma_i$, remain for the
two $\mu p$ experiments. Those discrepancies are further indicated by elevated values of
$\chi^2/N$: 384/339 and 332/201. This analysis removes the effect of any possible tension
between the various muon experiments, so the discrepancies must be internal to each $\mu p$
data set, or else they reflect a conflict with the non-muon data.

Tension can also be created by insufficient flexibility in the functional forms that are
used to approximate the PDFs at QCD scale $\mu_0$. In order to investigate this possible
"parametrization dependence," new fits were carried out in which two additional free
parameters were introduced. The new parameters were added in
$u_v(x) = u(x) - \bar u(x)$ and $d_v(x) = d(x) - \bar d(x)$, since these valence quark
distributions dominate at large $x$, where the muon experiments are important according to
Fig. 6. The second group of four lines in Table III shows the DSD results for these fits,
where again only one muon experiment is included in each fit. The additional freedom
produces a better agreement with the BCDMS $\mu p$ experiment, as $\chi^2$ drops from 384 to
365 for 339 data points. However, the $\sigma_i$ show that substantial tensions remain for
all four of the muon experiments.

The third group of four lines in Table III shows results from a fit
that once again includes all four muon experiments, as in Table I,
but includes the two new valence-quark parameters. One sees that
this more flexible parametrization does not eliminate the tension.
Further improvement cannot be obtained by further increasing the
flexibility of the parametrization, because attempts to do that are
foiled by the fits becoming unstable, due to large undetermined
parameters.

It was possible to shed further light on the source 
of tension in the muon data sets by splitting each set 
into a low-Q and high-Q region. When this was done 
(not shown), it was found that both the low-Q and 
the high-Q portions of each muon experiment are sep- 
arately consistent with the non-muon data. Hence the 
observed tension is generated by the Q-dependence of 
each muon data set. 

This is actually very plausible, because the BCDMS data have
previously been shown [41] to contain a significant "higher-twist"
component (non-leading power-law dependence in Q), which is not
taken into account in the PDF fit. Higher-twist effects can be
expected to be even more important for the NMC data, since more of
that data is at small Q.

To suppress higher-twist contributions (or other possible
deviations from NLO QCD at low Q), we now remove the BCDMS data
that lie below the dashed line and the NMC data that lie below the
solid line in Fig. 6. (These cuts were chosen roughly based on
[41]; they have not been optimized.) The resulting fit, which
includes all four of the muon data sets, is summarized by the final
group of four lines in Table III. With the cuts, the large tension
has gone away. The $\chi^2/N$ for the BCDMS experiments is also
greatly improved, and is now within the normal range. The
$\chi^2/N$ is still a bit high for the NMC $\mu p$ data, which
suggests that a somewhat stronger cut would be desirable for that
data set.
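
The quality-of-fit statements above can be checked directly from the N and $\chi^2$ columns of Table III. The following is a minimal sketch, not part of the original analysis: the dictionary layout and the function names are illustrative choices, and the "excess" measure $(\chi^2 - N)/\sqrt{2N}$ assumes the nominal Gaussian spread of $\chi^2$ about N.

```python
import math

# (N, chi2) pairs copied from Table III: first group (one muon experiment
# per fit, original parametrization) and fourth group (all four muon
# experiments, extra valence freedom, kinematic cuts applied).
TABLE_III = {
    "BCDMS F2p": {"group 1": (339, 384), "group 4": (250, 234)},
    "BCDMS F2d": {"group 1": (251, 248), "group 4": (210, 188)},
    "NMC F2p":   {"group 1": (201, 332), "group 4": (91, 135)},
    "NMC F2p/d": {"group 1": (123, 121), "group 4": (71, 64)},
}

def chi2_per_point(n, chi2):
    """chi^2 divided by the number of data points."""
    return chi2 / n

def excess_in_sigma(n, chi2):
    """Deviation of chi^2 from its expected value N, in units of sqrt(2N)."""
    return (chi2 - n) / math.sqrt(2.0 * n)

for name, fits in TABLE_III.items():
    for label, (n, chi2) in fits.items():
        print(f"{name:10s} {label}: chi2/N = {chi2_per_point(n, chi2):.2f}, "
              f"excess = {excess_in_sigma(n, chi2):+.1f} sigma")
```

With these inputs the BCDMS $\mu p$ ratio falls from about 1.13 to about 0.94, while the NMC $\mu p$ set remains roughly three nominal standard deviations high even after the cuts, in line with the remark that a stronger cut would be desirable there.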



10. IMPLICATIONS FOR PDF ANALYSIS 

The results presented here support a tolerance criterion of
$\Delta\chi^2 \approx 10$ to estimate the 90% confidence range of
PDF uncertainties, based solely on the uncertainties of the input
data. In contrast, PDF determinations made using the Hessian method
[?] generally include a much broader allowed range of uncertainty,
e.g. $\Delta\chi^2 = 50$ in MRST [12] and $\Delta\chi^2 = 100$ for
90% confidence in CTEQ [?]. This larger range arises from adopting
a "hypothesis-testing" criterion [39], according to which any PDF
configuration that provides a satisfactory fit to all of the input
data sets is deemed acceptable. Loosely speaking, the
hypothesis-testing criterion is defined by
$\chi^2 < N + \sqrt{2N}$ for each experiment. The overall allowed
$\Delta\chi^2$ is therefore $\sim \sqrt{2N_{\rm total}}$, i.e., 77
for 3000 data points at $1\sigma$.
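
As a quick arithmetic check of the figure just quoted, taking $N_{\rm total} = 3000$ as in the text:

```latex
\Delta\chi^2 \sim \sqrt{2 N_{\rm total}}
  = \sqrt{2 \times 3000}
  = \sqrt{6000}
  \approx 77.5
```

which agrees with the quoted value of 77 to within rounding.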

In detail, the hypothesis-testing condition is corrected for finite
N for each experiment, and refined on the basis of the lowest
possible $\chi^2$ that can be achieved for that experiment. In the
CTEQ fits [?], contributions to $\chi^2$ from some of the data sets
are enhanced by weight factors, which are chosen to keep the fits
to those experiments adequate over the
$\chi^2_{\rm tot} < \chi^2_{\rm min} + \Delta\chi^2$ range. In the
most recent CTEQ fit [?], that procedure is supplemented by adding
a quartic penalty term to the effective $\chi^2$, to force the fits
to some recalcitrant experiments to remain satisfactory over the
entire region defined by $\Delta\chi^2$. Meanwhile, recent MSTW
fits [?] abandon the use of a fixed $\Delta\chi^2$, and instead
determine the uncertainty limit "dynamically" along each
eigenvector direction (separately for "+" and "-" senses), as the
point where the fit first becomes unacceptable to one of the data
sets.

Expt             N    $\chi^2$  $(\gamma_1, \sigma_1), (\gamma_2, \sigma_2), \ldots$
------------------------------------------------------------------------------------------------------------
BCDMS $F_2 p$    339  384  (0.88, 2.37) (0.77, 0.31) (0.57, 3.03) (0.44, 3.54) (0.12, 5.79)
BCDMS $F_2 d$    251  248  (0.44, 1.98) (0.31, 0.02) (0.28, 3.25) (0.20, 0.07) (0.18, 0.85)
NMC $F_2 p$      201  332  (0.43, 2.16) (0.26, 0.68) (0.21, 5.95) (0.11, 2.94)
NMC $F_2 p/d$    123  121  (0.83, 2.73) (0.78, 1.76) (0.77, 2.04) (0.62, 1.80) (0.44, 0.36) (0.26, 0.61)
------------------------------------------------------------------------------------------------------------
BCDMS $F_2 p$    339  365  (0.89, 1.42) (0.80, 0.71) (0.78, 0.11) (0.61, 2.95) (0.41, 0.02) (0.28, 3.58)
BCDMS $F_2 d$    251  249  (0.46, 2.02) (0.42, 3.80) (0.37, 1.49) (0.29, 0.31) (0.21, 1.24) (0.17, 0.19)
NMC $F_2 p$      201  331  (0.87, 1.74) (0.57, 2.29) (0.22, 4.92) (0.16, 4.20)
NMC $F_2 p/d$    123  118  (0.89, 3.65) (0.80, 2.74) (0.58, 2.52) (0.48, 0.18) (0.17, 0.31)
------------------------------------------------------------------------------------------------------------
BCDMS $F_2 p$    339  365  (0.68, 1.50) (0.64, 0.98) (0.48, 0.11) (0.35, 2.32) (0.17, 3.48) (0.10, 1.37)
BCDMS $F_2 d$    251  260  (0.32, 1.55) (0.28, 0.80) (0.21, 0.81) (0.18, 3.47) (0.15, 1.09)
NMC $F_2 p$      201  338  (0.53, 0.37) (0.21, 5.48) (0.15, 4.54)
NMC $F_2 p/d$    123  119  (0.66, 0.88) (0.55, 4.05) (0.45, 0.98) (0.31, 0.53) (0.15, 0.11)
------------------------------------------------------------------------------------------------------------
BCDMS $F_2 p$    250  234  (0.69, 0.42) (0.60, 0.99) (0.46, 0.24) (0.31, 1.11) (0.12, 2.04)
BCDMS $F_2 d$    210  188  (0.29, 1.64) (0.26, 2.25) (0.20, 0.91) (0.18, 2.16) (0.16, 2.79) (0.12, 2.29)
NMC $F_2 p$       91  135  (0.20, 2.01) (0.11, 1.71)
NMC $F_2 p/d$     71   64  (0.59, 2.59) (0.45, 1.87) (0.32, 0.78) (0.22, 1.11)

TABLE III: Results with $\gamma_i > 0.1$ for the $\mu p$ and $\mu d$ experiments. In the first two groups, only one muon experiment
(the one listed) is included in the global fit. In the last two groups, all four are included. The first group uses the same
parametrizations as Table I, while the other three groups use a parametrization with additional freedom for $u_v$ and $d_v$ at
large x. The fourth group includes the additional kinematic cuts shown in Fig. 6.

The hypothesis-testing criterion is a minimal requirement for
acceptable fits. It defines a broader uncertainty limit than would
be predicted on normal statistical grounds, which has been called
the "parameter-fitting" criterion [39]. The parameter-fitting
criterion is ideally defined by $\Delta\chi^2 = 1.0$ (2.7) for a
68% (90%) confidence interval. In view of the results of this
paper, that should be expanded in practice to
$\Delta\chi^2 \approx 10$ for 90% confidence, on the basis of
inconsistencies observed among the implications of different data
sets.
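
The ideal parameter-fitting values quoted above follow from Gaussian statistics for a single parameter, where $\Delta\chi^2$ is the square of the number of standard deviations that bounds the interval. A minimal sketch of that check, using only the Python standard library (the function name is an illustrative choice, and 68% is taken as the usual 68.27%):

```python
from statistics import NormalDist

def delta_chi2_one_parameter(confidence):
    """Delta chi^2 that bounds a two-sided interval for one Gaussian parameter."""
    # Number of standard deviations enclosing the requested probability.
    z = NormalDist().inv_cdf(0.5 * (1.0 + confidence))
    return z * z

print(round(delta_chi2_one_parameter(0.6827), 2))  # 1.0
print(round(delta_chi2_one_parameter(0.90), 2))    # 2.71
```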

From a statistical point of view, the hypothesis-testing criterion
appears to be overly conservative. That notion is challenged,
however, by the apparent "time-dependence" and "space-dependence"
of the PDFs. Namely, we have repeatedly seen changes from one
generation of PDFs to the next, e.g.,
CTEQ5/CTEQ6.0/CTEQ6.1/CTEQ6.6/CT09 or
MRST2001/MRST2002/MRST2004/MSTW2008, for which the central estimate
for some flavor in a set is close to the predicted 90% confidence
limit from the previous set; and differences between PDF sets
determined by different groups, such as CTEQ and MSTW, are also
frequently as large as these broad uncertainty estimates. Examples
of this can be seen in Figs. 8 and 12 of [?].

Some of the time-dependence has resulted from improvements in the
theory, such as the better treatment of heavy quark mass effects
beginning with CTEQ6.6; or additions to the available data. But
other differences between PDF determinations arise from the choice
of which data sets to include; from the choice of kinematic cuts
such as those introduced in Sec. 9 to remove data points for which
the perturbative QCD treatment is suspect; and from the choice of
parametrizing functions at $\mu_0$. Additional uncertainties are
present due to the NLO approximations made in the theory. Adopting
the hypothesis-testing criterion can be seen as an expedient way to
broaden the estimated uncertainty range to allow for these
uncertainties, although obviously they cannot be reliably predicted
on the basis of the errors in the experimental data sets.

FIG. 7: Uncertainty of valence quark distributions at $\Delta\chi^2 = 10$. The solid shaded region is the fit in Table I. The dotted
region is the fit in the last section of Table III. The long-dashed and short-dashed curves show $\bar{u}$ and $\bar{d}$ respectively for
comparison.

Quantities that are weakly constrained by the data are especially
subject to parametrization dependence. A classic example of this is
provided by the gluon distribution at large x. Prior to
measurements of the inclusive jet cross section at the Tevatron,
there was very little information on the gluon at large x. The
parametrizations used at that time therefore devoted very few
parameters to the large-x region, since unconstrained parameters
make the fitting procedure unstable. When the jet data became
available, they were found to lie outside the predicted uncertainty
range. This pointed to a need to introduce additional parameters,
which could then be determined by stable fits using the new data.

A further example is shown in Fig. 7, which displays the valence
quark distributions $u_v = u - \bar{u}$ and $d_v = d - \bar{d}$.
Their uncertainty is large for x < 0.01, because their contribution
from that region to most observables is swamped by much larger
contributions from $\bar{u}$ and $\bar{d}$, which are also shown in
the figure. The uncertainties are shown for both the original fit
(described in Table I) and the final fit (described in the last
four lines of Table III). The final central fit lies far outside
the uncertainty band estimated in the earlier fit, when that
uncertainty is computed at $\Delta\chi^2 = 10$. At large x, the
situation is reversed: valence quarks dominate the phenomenology,
so $u_v$ and $d_v$ are very well measured there, and the difference
between the two fits is small and consistent with the estimated
uncertainty.

An emerging alternative to the Hessian approach is provided by the
NNPDF [5] method, which avoids the parametrization problem by using
very flexible Neural Network representations instead of functional
forms to describe the PDFs at $\mu_0$. An attractive feature of the
NNPDF approach is that introducing new measurements reduces the
uncertainty of the output PDFs, unless the new data are rather
inconsistent with the previous data. The same cannot be said for
the Hessian approach, because when new data sets are added to the
global fit, it is often desirable to increase the flexibility of
the parametrization, as happened with the inclusive jet cross
section as discussed above. The NNPDF method incorporates
experimental errors by creating an ensemble of fits to "pseudodata"
sets in which the measured values are displaced by random shifts
that are proportional to the experimental uncertainties. It would
be interesting to apply the NNPDF method to assess the
uncertainties that are not associated with the experimental errors,
by using the original unshifted data to produce each element in the
ensemble. The ensemble would then retain the other sources of
uncertainty due to the other random processes used to create it.
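
The pseudodata step described above can be illustrated with a toy model. The sketch below is only an illustration of the replica idea and is not the NNPDF procedure: it swaps the neural-network fit for a straight-line fit $y = ax + b$, draws Gaussian shifts with widths equal to the quoted errors, and uses hypothetical measurements and function names throughout.

```python
import random
import statistics

def fit_line(xs, ys, errs):
    """Weighted least-squares fit of y = a*x + b; returns (a, b)."""
    w = [1.0 / e ** 2 for e in errs]
    s = sum(w)
    sx = sum(wi * x for wi, x in zip(w, xs))
    sy = sum(wi * y for wi, y in zip(w, ys))
    sxx = sum(wi * x * x for wi, x in zip(w, xs))
    sxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    det = s * sxx - sx * sx
    return (s * sxy - sx * sy) / det, (sxx * sy - sx * sxy) / det

def replica_fits(xs, ys, errs, n_rep, seed=0):
    """Fit n_rep pseudodata sets, each shifted randomly within the errors."""
    rng = random.Random(seed)
    fits = []
    for _ in range(n_rep):
        pseudo = [y + rng.gauss(0.0, e) for y, e in zip(ys, errs)]
        fits.append(fit_line(xs, pseudo, errs))
    return fits

# Hypothetical measurements with unit errors at x = -1, 0, 1.
xs, ys, errs = [-1.0, 0.0, 1.0], [0.3, -1.2, 2.0], [1.0, 1.0, 1.0]
fits = replica_fits(xs, ys, errs, n_rep=4000)
spread_a = statistics.stdev([a for a, _ in fits])
spread_b = statistics.stdev([b for _, b in fits])
print(spread_a, spread_b)
```

For this configuration the spreads of the ensemble should approach $1/\sqrt{2}$ for a and $1/\sqrt{3}$ for b, which are the uncertainties that propagation of the unit errors gives for a straight-line fit to three such points. Setting the shifts to zero would leave only whatever spread the fitting procedure itself introduces, which is the variation suggested in the text; in this deterministic toy that spread is exactly zero.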

The effective number of parton parameters that can be measured by
the available data (currently around 25) is small enough that the
traditional Hessian method is convenient. But the number of
potential parameters that could be determined by some future
experiment, but which are currently unconstrained, is of course
very large or even infinite. So it is not practical to provide
parameters for all such potential degrees of freedom. However, the
Hessian approach may nevertheless be viable for the large range of
predictions for which PDFs are needed, since the processes one
wishes to predict depend on similar aspects of the PDFs to the
experiments that are used to determine them. As an extreme case in
point, the PDF fits described in this paper admit no uncertainty at
all in the assumption $s(x,\mu) = \bar{s}(x,\mu)$. They can
therefore not be used to predict new processes that are sensitive
to the strangeness asymmetry
$s^-(x,\mu) = s(x,\mu) - \bar{s}(x,\mu)$; but most processes we
wish to predict are not in fact sensitive to that asymmetry.



11. CONCLUSION 

The recently-developed DSD method [?] has been applied to assess
compatibility among the data sets that are used to extract parton
distribution functions. The DSD method is more discerning than the
previous method [?] of studying correlations between $\chi^2$
values for the various experiments, because it looks for
inconsistencies of each experiment along the specific directions in
parameter space for which that experiment is significant in the
global fit, while ignoring the large number of directions along
which the experiment is unimportant.

Results from the DSD method, which are shown in Table I, can be
read as a "report card" on the contribution of each experiment to
the global fit. The $\gamma_i$ parameters measure how much each
experiment influences the fit, while the $\sigma_i$ parameters
measure how much dissonance each experiment brings with it.

Table I identified fixed-target $\mu p$ and $\mu d$ experiments as
the greatest source of tension in a recent global fit. Further
exploration in Sec. 9 revealed the underlying cause of that tension
as deviations from NLO QCD predictions (presumably due to higher
twist), which had previously been observed in these data [41], but
which were not taken into account in the fit. Kinematic cuts shown
in Fig. 6 remove the contaminated region and eliminate the large
discrepancies, as can be seen by comparing the last four lines of
Table III with their corresponding entries in Table I. Future
global fits should make a refined version of these cuts, or else
introduce additional fitting parameters to model the higher-twist
contribution. (The latter was attempted in CTEQ6 [?], without
conclusive results, the DSD method not being available at that time
to make a sensitive test of the consistency.)

Independently of the muon experiments, the implications of the
various data sets in the global fit are found to be somewhat
inconsistent with each other (Fig. 5). The average discrepancy is a
bit less than a factor of 2 larger than what is predicted by
straightforward propagation of the experimental errors. This was
shown in Sec. 7 to suggest that the 90% confidence limit for
predictions from the global fit should be estimated by a tolerance
criterion of $\Delta\chi^2 \approx 10$, in place of the
$\Delta\chi^2 = 2.71$ that would be implied by pure Gaussian
statistics.
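
One way to see that these two numbers are consistent with each other, assuming the quadratic (Hessian) approximation in which a parameter's uncertainty scales as $\sqrt{\Delta\chi^2}$, is to compare the two tolerance values directly:

```latex
\sqrt{\frac{\Delta\chi^2_{\rm tolerance}}{\Delta\chi^2_{\rm Gaussian}}}
  = \sqrt{\frac{10}{2.71}}
  \approx 1.92
```

which is indeed a bit less than a factor of 2.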

Much larger tolerance criteria ($\Delta\chi^2 = 50$ [12] or
$\Delta\chi^2 = 100$ [?]) have been used to estimate the 90%
confidence limit in recent applications of the Hessian approach.
These more conservative tolerance criteria correspond to the
"hypothesis testing" notion that any PDF set is acceptable as long
as its fit to every data set lies in the nominal statistical range
$\chi^2 = N \pm \sqrt{2N}$, or its 90% analog, with appropriate
corrections for finite N. This implies an effective overall
$\Delta\chi^2_{\rm total} \sim \sqrt{2N_{\rm total}} \approx 75$
for $1\sigma$. The uncertainty from input data, as assessed in this
paper by studying its mutual consistency, does not call for this
expanded uncertainty range. However, some aspects of the fit do no
doubt require an expanded uncertainty estimate, because of
theoretical systematic errors, most notably the use of NLO
perturbation theory and parametrization dependence. The need for
some such expanded uncertainty is demonstrated by the relatively
large changes in uncertainty bands that can be caused by relatively
minor changes in the choice of parametrization or in the choice of
data sets that are included. Examples of this are provided by the
valence quark distributions at small x and the gluon distribution
at large x, both discussed in Sec. 10.

In the future, it would be desirable to estimate the uncertainties
associated with parametrization choices and other theoretical
errors directly, rather than using a large $\Delta\chi^2$ to stand
in for them in a manner that is based artificially on the
uncertainties of the data. If this can be accomplished, the result
will likely expand the estimated uncertainty range for quantities
that are poorly constrained; but it may reduce the uncertainty for
quantities that are well constrained, because of the reduction in
$\Delta\chi^2$. From the ratio of $\Delta\chi^2$ values, one might
hope to find the uncertainty reduced by as much as a factor of 3;
but the actual reduction will probably be less than that, because
$\chi^2$ generally rises faster than quadratic for large
displacements from the best fit.
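
The factor of 3 can be traced to the tolerance values quoted earlier, again assuming a purely quadratic $\chi^2$ so that uncertainties scale as $\sqrt{\Delta\chi^2}$:

```latex
\sqrt{\frac{100}{10}} \approx 3.2 ,
\qquad
\sqrt{\frac{50}{10}} \approx 2.2
```

The first ratio corresponds to replacing $\Delta\chi^2 = 100$ by $\Delta\chi^2 \approx 10$, and the second to starting from $\Delta\chi^2 = 50$.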
Appendix

A formal derivation of the DSD method was presented in [?] and it
is reviewed in Sec. [?]. To assist in understanding the method,
this Appendix illustrates it by a simple explicit example.

Suppose we have a theory that predicts a linear re- 
lationship y = ax + b where a and b are unknown 
parameters. Further suppose there are three experi- 
ments, which have measured 



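\begin{equation}
\begin{array}{ll}
\text{Expt 1:} & y = y_1 \pm 1 \ \text{at} \ x = -1 \\
\text{Expt 2:} & y = y_2 \pm 1 \ \text{at} \ x = 0 \\
\text{Expt 3:} & y = y_3 \pm 1 \ \text{at} \ x = 1 \, .
\end{array}
\tag{25}
\end{equation}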



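The fit to these three experiments is described by

\begin{equation}
\chi^2 = \chi_1^2 + \chi_2^2 + \chi_3^2
\tag{26}
\end{equation}

where

\begin{equation}
\chi_1^2 = \left(\frac{y_1 - (-a + b)}{1}\right)^2 , \qquad
\chi_2^2 = \left(\frac{y_2 - b}{1}\right)^2 , \qquad
\chi_3^2 = \left(\frac{y_3 - (a + b)}{1}\right)^2 .
\tag{27}
\end{equation}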

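It is natural to replace the theory parameters a and b by new
parameters $u_1$ and $u_2$ that are measured from the minimum point
in $\chi^2$ and normalized in the standard Hessian way:

\begin{equation}
a = \frac{u_1}{\sqrt{2}} + \frac{y_3 - y_1}{2} , \qquad
b = \frac{u_2}{\sqrt{3}} + \frac{y_1 + y_2 + y_3}{3} .
\tag{28}
\end{equation}

This puts $\chi^2$ into the standard diagonal form

\begin{equation}
\chi^2 = u_1^2 + u_2^2 + K^2 ,
\tag{29}
\end{equation}

where

\begin{equation}
K = (y_1 + y_3 - 2 y_2)/\sqrt{6} \, .
\tag{30}
\end{equation}

The transformation (28) that yields (29) contains a shift and a
rescaling of the original fitting parameters a and b. In general,
it also requires a rotation (orthogonal transformation) that
intermingles those variables; but that was not necessary in this
simple example because of symmetry. The uncertainty on the theory
parameters a and b in the global fit can now be obtained easily
from Eq. (29), which implies that the $1\sigma$ limits are
$u_1 = \pm 1$ and $u_2 = \pm 1$.

In order to examine the internal consistency of this fit, we must
consider the contributions to $\chi^2$ from the individual
experiments. These can be expressed in terms of the new coordinates
as

\begin{equation}
\begin{aligned}
\chi_1^2 &= \left(\frac{u_1}{\sqrt{2}} - \frac{u_2}{\sqrt{3}} + \frac{K}{\sqrt{6}}\right)^2 \\
\chi_2^2 &= \left(\frac{u_2}{\sqrt{3}} + \frac{2K}{\sqrt{6}}\right)^2 \\
\chi_3^2 &= \left(\frac{u_1}{\sqrt{2}} + \frac{u_2}{\sqrt{3}} - \frac{K}{\sqrt{6}}\right)^2 .
\end{aligned}
\tag{31}
\end{equation}

To study the consistency between Expt 1 and its complement, it is
necessary to make a further coordinate transformation to
diagonalize $\chi_1^2$. That transformation can be found by the DSD
method; or in this simple case, by inspection:

\begin{equation}
u_1 = \frac{\sqrt{3}\, w_1 + \sqrt{2}\, w_2}{\sqrt{5}} , \qquad
u_2 = \frac{-\sqrt{2}\, w_1 + \sqrt{3}\, w_2}{\sqrt{5}} .
\tag{32}
\end{equation}

This gives

\begin{equation}
\chi_1^2 = \frac{5}{6}\left(w_1 + \frac{K}{\sqrt{5}}\right)^2 , \qquad
\chi_2^2 + \chi_3^2 = w_2^2 + \frac{1}{6}\left(w_1 - \sqrt{5}\,K\right)^2 .
\tag{33}
\end{equation}

From this, one easily reads

\begin{equation}
\begin{array}{ll}
\text{Expt 1:} & w_1 = -\sqrt{1/5}\; K \pm \sqrt{6/5} \\
\text{Expt } \bar{1}\text{:} & w_1 = +\sqrt{5}\; K \pm \sqrt{6} \, .
\end{array}
\tag{34}
\end{equation}

Subtracting these two measurements and combining their errors in
quadrature shows that they differ by
$\sqrt{36/5}\, K \pm \sqrt{36/5}$. This differs from 0 by K
standard deviations, which is the measure of consistency between
Expt 1 and its complement.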
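The contribution to $\chi^2$ from Expt 2 in Eq. (31) happens to be
already in the diagonal form that is the heart of the DSD method,
so it requires no further transformation:

\begin{equation}
\chi_2^2 = \frac{1}{3}\left(u_2 + \sqrt{2}\,K\right)^2 , \qquad
\chi_1^2 + \chi_3^2 = u_1^2 + \frac{2}{3}\left(u_2 - \frac{K}{\sqrt{2}}\right)^2 .
\tag{35}
\end{equation}

From this, one reads

\begin{equation}
\begin{array}{ll}
\text{Expt 2:} & u_2 = -\sqrt{2}\; K \pm \sqrt{3} \\
\text{Expt } \bar{2}\text{:} & u_2 = +\sqrt{1/2}\; K \pm \sqrt{3/2} \, .
\end{array}
\tag{36}
\end{equation}

Subtracting these results and combining their errors in quadrature
shows that the measurement of $u_2$ by Expt 2 and its complement
differ by $\sqrt{9/2}\, K \pm \sqrt{9/2}$. This difference is also
K standard deviations away from 0.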


Because there are only three data points in this ex- 
ample, with two free parameters in the theory, there 
is only one possible test of the internal consistency. 
That is why both Expt 1 and Expt 2 show the same 
discrepancy K, when the discrepancy is measured in 
standard deviations. To show that Expt 3 would also 
give the same result is left as an exercise for the reader! 

In this simple example, the consistency measure can also be found
by elementary means: adding the errors from (25) in quadrature
gives $\pm\sqrt{6}$ for the uncertainty of $y_1 + y_3 - 2y_2$, and
hence the uncertainty of K is $\pm 1$ by Eq. (30). Meanwhile, the
theoretical prediction for K is 0, since the theory predicts y to
be a linear function of x, and $y_1$, $y_2$, $y_3$ are measured
symmetrically at x = -1, 0, 1. Hence the difference between theory
and experiment is $K \pm 1$, so K is indeed the discrepancy
measured in standard deviations.
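
The algebra of this example can be checked numerically. The sketch below is only an illustration under stated assumptions: the measured values and trial parameters are hypothetical, the function names are not from the paper, and the central values and errors fed to `pull` are those read from Eqs. (34) and (36).

```python
import math

def best_fit(y1, y2, y3):
    """Minimum of chi^2 for y = a*x + b with unit errors at x = -1, 0, 1."""
    a0 = (y3 - y1) / 2.0
    b0 = (y1 + y2 + y3) / 3.0
    K = (y1 + y3 - 2.0 * y2) / math.sqrt(6.0)
    return a0, b0, K

def chi2_parts(a, b, y1, y2, y3):
    """Contributions of Expt 1, 2 and 3 to chi^2, as in Eq. (27)."""
    return (y1 - (-a + b)) ** 2, (y2 - b) ** 2, (y3 - (a + b)) ** 2

def hessian_coords(a, b, y1, y2, y3):
    """Shifted and rescaled parameters u1, u2 of Eq. (28)."""
    a0, b0, _ = best_fit(y1, y2, y3)
    return math.sqrt(2.0) * (a - a0), math.sqrt(3.0) * (b - b0)

def pull(value, err, value_c, err_c):
    """Difference between an experiment and its complement, in std deviations."""
    return (value_c - value) / math.hypot(err, err_c)

def consistency(y1, y2, y3):
    """K and the pulls of Expt 1 and Expt 2 against their complements."""
    _, _, K = best_fit(y1, y2, y3)
    pull_1 = pull(-K / math.sqrt(5.0), math.sqrt(6.0 / 5.0),
                  math.sqrt(5.0) * K, math.sqrt(6.0))
    pull_2 = pull(-math.sqrt(2.0) * K, math.sqrt(3.0),
                  K / math.sqrt(2.0), math.sqrt(1.5))
    return K, pull_1, pull_2

y = (0.3, -1.2, 2.0)   # hypothetical measurements y1, y2, y3
a, b = 0.7, -0.4       # arbitrary trial parameters
u1, u2 = hessian_coords(a, b, *y)
K, pull_1, pull_2 = consistency(*y)
print(sum(chi2_parts(a, b, *y)), u1 ** 2 + u2 ** 2 + K ** 2)  # Eq. (29)
print(K, pull_1, pull_2)
```

The first line of output compares the directly computed $\chi^2$ with the diagonal form of Eq. (29), and the second compares K with the two pulls; each set of numbers should agree to rounding error.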